Owing to the computational and storage efficiency of compact binary codes, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts because of their sparseness and shortness. Recently, some researchers have tried to use latent topics of a certain granularity to preserve semantic similarity in hash codes beyond keyword matching. However, topics of a certain granularity are not adequate to represent the intrinsic semantic information. In this paper, we present a novel unified approach for short text Hashing using Multi-granularity Topics and Tags, dubbed HMTT. In particular, we propose a selection method to choose the optimal multi-granularity topics depending on the type of dataset, and design two distinct hashing strategies to incorporate multi-granularity topics. We also propose a simple and effective method to exploit tags to enhance the similarity of related texts. We carry out extensive experiments on one short text dataset as well as on one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baselines on several evaluation metrics.